Nemesis: Neural Mean Teacher Learning-Based Emotion-Centric Speaker

Abstract

Image captioning is the multi-modal task of automatically describing a digital image based on its contents and their semantic relationship. This research area has gained increasing popularity over the past few years; however, most previous studies have focused on purely objective, content-based descriptions of image scenes. In this study, efforts are made to generate more engaging captions by leveraging human-like emotional responses. To achieve this task, a mean teacher learning-based method is applied to the recently introduced ArtEmis dataset. ArtEmis is the first large-scale dataset for emotion-centric image captioning, containing 455K emotional descriptions of 80K artworks from WikiArt. The method includes a self-distillation relationship between memory-augmented language models with meshed connectivity. These language models are trained in a cross-entropy phase and then fine-tuned in a self-critical sequence training phase. According to various popular natural language processing metrics, such as BLEU, METEOR, ROUGE-L, and CIDEr, the proposed model obtains a new state of the art on ArtEmis.
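
As a rough illustration of the mean teacher idea named in the abstract: in the standard mean teacher formulation, the teacher's weights are kept as an exponential moving average (EMA) of the student's weights. The sketch below assumes that standard update. The abstract does not state the exact update rule, so the function name `ema_update`, the dictionary-of-lists weight layout, and the decay value are hypothetical choices for illustration, not the authors' implementation.

```python
from typing import Dict, List

Weights = Dict[str, List[float]]


def ema_update(teacher: Weights, student: Weights, decay: float = 0.99) -> Weights:
    """Return new teacher weights as an EMA of the student weights.

    Assumption: the standard mean teacher rule
        teacher <- decay * teacher + (1 - decay) * student
    The decay value is a hypothetical default, not taken from the paper.
    """
    return {
        name: [decay * t + (1.0 - decay) * s
               for t, s in zip(teacher[name], student[name])]
        for name in teacher
    }


# Toy usage: one parameter tensor flattened into a list.
teacher = {"w": [0.0, 1.0]}
student = {"w": [1.0, 1.0]}
teacher = ema_update(teacher, student, decay=0.5)
```

With this kind of update the teacher changes more slowly than the student, which is what makes it usable as a stable target for self-distillation.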

Similar resources

Speaker Change Detection based on Mean Shift

To address the problem that the search for speaker change points (SCPs) is blind and exhaustive, this paper proposes mean shift to find SCPs by estimating the kernel density of the speech stream. The method contains three steps: first, finding peak points using mean shift; second, computing the maximum likelihood ratio (MLR) value at the peak points; and third, seeking SCPs from MLR value using the ...

Cost-Sensitive Learning for Emotion Robust Speaker Recognition

In the field of information security, voice is one of the most important parts in biometrics. Especially, with the development of voice communication through the Internet or telephone system, huge voice data resources are accessed. In speaker recognition, voiceprint can be applied as the unique password for the user to prove his/her identity. However, speech with various emotions can cause an u...

Emotion-Based Reinforcement Learning

Studies have shown that counterfactual reasoning can shape human decisions. However, there is a gap in the literature between counterfactual choices in description-based and experience-based paradigms. While studies using description-based paradigms suggest participants maximize expected subjective emotion, studies using experience-based paradigms assume that participants learn the values of opt...

Singing speaker clustering based on subspace learning in the GMM mean supervector space

In this study, we propose algorithms based on subspace learning in the GMM mean supervector space to improve performance of speaker clustering with speech from both reading and singing. As a speaking style, singing introduces changes in the time-frequency structure of a speaker’s voice. The purpose of this study is to introduce advancements for speech systems such as speech indexing and retriev...

Speaker Characteristics and Emotion Classification

In this paper, we address the — interrelated — problems of speaker characteristics (personalization) and suboptimal performance of emotion classification in state-of-the-art modules from two different points of view: first, we focus on a specific phenomenon (irregular phonation or laryngealization) and argue that its inherent multi-functionality and speaker-dependency makes its use as feature i...

Journal

Journal title: Algorithms

Year: 2023

ISSN: 1999-4893

DOI: https://doi.org/10.3390/a16020097